1 Run-time adaptation in river 100% 
Remzi H. Arpaci-Dusseau 

ACM Transactions on Computer Systems (TOCS) February 2003 

Volume 21 Issue 1 

We present the design, implementation, and evaluation of run-time adaptation within the River 
dataflow programming environment. The goal of the River system is to provide adaptive 
mechanisms that allow database query-processing applications to cope with performance 
variations that are common in cluster platforms. We describe the system and its basic 
mechanisms, and carefully evaluate those mechanisms and their effectiveness. In our analysis, 
we answer four previously unanswered and important que ... 



2 Performance of the CRAY T3E multiprocessor 100% 
Ed Anderson , Jeff Brooks , Charles Grassl , Steve Scott 

Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) 

November 1997 

The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 
microprocessor. The system includes a number of novel architectural features designed to 
tolerate latency, enhance scalability, and deliver high performance on scientific and 
engineering codes. Included among these are stream buffers, which detect and prefetch down 
small-stride reference streams, E-registers, which provide latency hiding and non-unit-stride 
access capabilities, barrier and fetch_an ... 



3 User-space communication: a quantitative study 100% 
Soichiro Araki , Angelos Bilas , Cezary Dubnicki , Jan Edler , Koichi Konishi , James Philbin 

Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM) 

November 1998 

Powerful commodity systems and networks offer a promising direction for high performance 
computing because they are inexpensive and they closely track technology progress. However, 
high, raw-hardware performance is rarely delivered to the end user. Previous work has shown 
that the bottleneck in these architectures is the overheads imposed by the software 
communication layer. To reduce these overheads, researchers have proposed a number of 
user-space communication models. The common featur ... 



4 A CRT editing system 100% 
Edgar T. Irons , Frans M. Djorup 

Communications of the ACM January 1972 

Volume 15 Issue 1 

A text-editing and manipulation program is described. The program operates from low-cost 
cathode-ray tube entry and display stations with keyboard and 13 function buttons. 
Applications, potential economy of operation, and some aspects of implementation are 
discussed. 



5 SPARK: a benchmark package for sparse computations 100% 
Youcef Saad , Harry A. G. Wijshoff 

ACM SIGARCH Computer Architecture News , Proceedings of the 4th international 
conference on Supercomputing June 1990 

Volume 18 Issue 3 

As the diversity of novel architectures expands rapidly there is a growing interest in studying 
the behavior of these architectures for computations arising in different applications. There has 
been significant efforts in evaluating the performance of supercomputers on typical dense 
computations, and several packages for this purpose have been developed, such as the Linpack 
benchmark, the Lawrence Livermore Loops, and the Los Alamos Kernels. On the other hand 
there has been little effort pu ... 



6 Data relocation and prefetching for programs with large data sets 100% 
Yoji Yamada , John Gyllenhall , Grant Haab , Wen-mei Hwu 

Proceedings of the 27th annual international symposium on Microarchitecture November 
1994 

Numerical applications frequently contain nested loop structures that process large arrays of 
data. The execution of these loop structures often produces memory reference patterns that 
poorly utilize data caches. Limited associativity and cache capacity result in cache conflict 
misses. Also, non-unit stride access patterns can cause low utilization of cache lines. Data 
copying has been proposed and investigated in order to reduce cache conflict misses, but this 
technique has a high executio ... 



7 Architecture: Leveraging cache coherence in active memory systems 99% 
Daehyun Kim , Mainak Chaudhuri , Mark Heinrich 

Proceedings of the 16th international conference on Supercomputing June 2002 
Active memory systems help processors overcome the memory wall when applications exhibit 
poor cache behavior. They consist of either active memory elements that perform data parallel 
computations in the memory system itself, or an active memory controller that supports 
address re-mapping techniques that improve data locality. Both active memory approaches 
create coherence problems, even on uniprocessor systems, since there are either additional 
processors operating on the data directly, or the ... 

8 Embedding Linux in a Commercial Product: A look at embedded systems and what it takes to 99% 
build one 

Joel R. Williams 

Linux Journal October 1999 

9 ENWRICH: a compute-processor write caching scheme for parallel file systems 99% 
Apratim Purakayastha , Carla Schlatter Ellis , David Kotz 

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of 
the federated computing research conference May 1996 

10 Comparison of Raw and Internet protocols in a HIPPI/ATM/SONET based gigabit network 99% 
Raj K. Singh , Stephen G. Tell , Shaun J. Bharrat 

ACM SIGCOMM Computer Communication Review January 1996 
Volume 26 Issue 1 

We compare implementation of Raw and Internet protocols (TCP, UDP) on a programmable 
HIPPI host-interface called the Network Interface Unit. The network interface unit connects 
Pixel-Planes 5, a message-based graphics multicomputer, to a wide area gigabit network called 
VISTAnet. The BISDN network consists of a SONET cross-connect switch and an ATM 
switch. We discuss the tradeoffs between protocols for our target application and present a 
comparison of end-to-end throughput based on empirical me ... 

11 A faster UDP 99% 
Craig Partridge , Stephen Pink 

IEEE/ACM Transactions on Networking (TON) August 1993 
Volume 1 Issue 4 

12 Letters to the editor: Letters to the editor 99% 

Communications of the ACM June 1964 
Volume 7 Issue 6 

13 Virtual database technology 99% 
Ashish Gupta , Venky Harinarayan , Anand Rajaraman 

ACM SIGMOD Record December 1997 

Volume 26 Issue 4 



14 Distributed storage control unit for the Hitachi S-3800 multivector supercomputer 99% 
Katsuyoshi Kitai , Tadaaki Isobe , Tadayuki Sakakibara , Shigeko Yazawa , Yoshiko Tamaki , 
Teruo Tanaka , Kouichi Ishii 

Proceedings of the 8th international conference on Supercomputing July 1994 

This paper discusses the storage control unit of the Hitachi S-3800 supercomputer series, 
which is capable of achieving 8 GFLOPS in each of up to four shared-memory 
multiprocessors. This storage control unit is distributed to the V-SCs (vector-processor-side 
storage control units) and the M-SCs (main-storage-side storage control units), and achieves 
128 gigabytes per second of total memory throughput. This distributed storage control unit 
supports scalability with increases in the number of ... 

15 VMTP: a transport protocol for the next generation of communication systems 99% 
D. Cheriton 

Proceedings of the ACM SIGCOMM conference on Communications architectures & 
protocols September 1986 

The Versatile Message Transaction Protocol (VMTP) is a transport-level protocol designed to 
support remote procedure call, multicast and real-time communication. The protocol is 
optimized for efficient page-level network file access in particular. In this paper, we describe 
the significant aspects of the VMTP design, including the VMTP treatment of sessions, 
addressing, duplicate suppression, flow control and retransmissions plus its provision for 
multicast. The VMTP design refle ... 

16 Fast and flexible application-level networking on exokernel systems 98% 
Gregory R. Ganger , Dawson R. Engler , M. Frans Kaashoek , Hector M. Briceno , Russell 
Hunt , Thomas Pinckney 

ACM Transactions on Computer Systems (TOCS) February 2002 
Volume 20 Issue 1 

Application-level networking is a promising software organization for improving performance 
and functionality for important network services. The Xok/ExOS exokernel system includes 
application-level support for standard network services, while at the same time allowing 
application writers to specialize networking services. This paper describes how Xok/ExOS's 
kernel mechanisms and library operating system organization achieve this flexibility, and 
retrospectively shares our experiences an ... 

17 Clustering: Evaluating document clustering for interactive information retrieval 98% 
Anton Leuski 

Proceedings of the tenth international conference on Information and knowledge 
management October 2001 

We consider the problem of organizing and browsing the top ranked portion of the documents 
returned by an information retrieval system. We study the effectiveness of a document 
organization in helping a user to locate the relevant material among the retrieved documents as 
quickly as possible. In this context we examine a set of clustering algorithms and 
experimentally show that a clustering of the retrieved documents can be significantly more 
effective than traditional ranked list approach. We a ... 



18 Memory access scheduling 98% 
Scott Rixner , William J. Dally , Ujval J. Kapasi , Peter Mattson , John D. Owens 

ACM SIGARCH Computer Architecture News , Proceedings of the 27th annual 
international symposium on Computer architecture May 2000 

Volume 28 Issue 2 

The bandwidth and latency of a memory system are strongly dependent on the manner in 
which accesses interact with the “3-D” structure of banks, rows, and columns 
characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference 
in bandwidth between successive references to different columns within a row and different 
rows within a bank. This paper introduces memory access scheduling, a technique that 
improves the performance of ... 



19 Exploiting ILP in page-based intelligent memory 98% 
Mark Oskin , Justin Hensley , Diana Keen , Frederic T. Chong , Matthew Farrens , Aneet 
Chopra 

Proceedings of the 32nd annual ACM/IEEE international symposium on 
Microarchitecture November 1999 

This study compares the speed, area, and power of different implementations of Active Pages 
[OCS98], an intelligent memory system which helps bridge the growing gap between 
processor and memory performance by associating simple functions with each page of data. 
Previous investigations have shown up to 1000X speedups using a block of reconfigurable 
logic to implement these functions next to each sub-array on a DRAM chip. In this study, we 
show that instruction-level parallelism, n ... 

20 Disco: running commodity operating systems on scalable multiprocessors 98% 
Edouard Bugnion , Scott Devine , Kinshuk Govil , Mendel Rosenblum 

ACM Transactions on Computer Systems (TOCS) November 1997 

Volume 15 Issue 4 

In this article we examine the problem of extending modern operating systems to run 
efficiently on large-scale shared-memory multiprocessors without a large implementation 
effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. We 
use virtual machines to run multiple commodity operating systems on a scalable 
multiprocessor. This solution addresses many of the challenges facing the system software for 
these machines. We demonstrate our approach with a prototy ... 